This Parallel Pivot represents a fundamental shift in computational philosophy: from temporal sequencing (executing one item after another) to spatial distribution (executing all operations simultaneously across the grid).
1. The Independence Heuristic
This is the golden rule of GPU computing: "If your problem can be phrased as 'apply some operation independently to each of N elements,' this is the mapping you should try first." This data-parallel approach is the lowest-hanging fruit of GPU acceleration: the overhead of thread management is far outweighed by the enormous throughput of parallel processing.
2. Precision and the Data Payload
HIP kernels typically process large arrays of primitive types. In high-performance graphics and machine learning we commonly use float (single precision, FP32), while scientific simulations that demand high numerical stability use double (double precision, FP64).
3. From Iteration to Occupation
In CPU code, the processor "visits" data through a loop. In GPU logic, the data "occupies" a thread. You no longer write how to loop; you write what a single worker at a specific coordinate should do.
$$\text{Index } i = \text{blockIdx.x} \times \text{blockDim.x} + \text{threadIdx.x}$$
QUESTION 1
What is the primary heuristic for deciding if a problem is suitable for the 'Parallel Pivot'?
The problem requires complex recursion.
The problem involves applying an operation independently to N elements.
The problem must be solved in a strict temporal order.
The problem uses only integer arithmetic.
✅ Correct!
This is the 'Independence Heuristic': if elements don't depend on each other, they can be processed in parallel.
❌ Incorrect
Parallelism works best when tasks are independent; recursion and strict order usually hinder it.
QUESTION 2
In the context of the Parallel Pivot, what does the term 'Occupation' refer to?
The CPU visiting each index in a for-loop.
How many blocks are currently queued in the GPU.
Data 'occupying' a specific thread at a specific coordinate.
The percentage of memory used by the float arrays.
✅ Correct!
We shift from a single CPU 'visiting' data points to many data points 'occupying' threads simultaneously.
❌ Incorrect
Occupation focuses on the spatial distribution of work across the thread grid.
QUESTION 3
Which data types are most commonly handled by HIP kernels for high numerical stability in science?
bool and char
int and long
float and double
void and pointer
✅ Correct!
Float (FP32) is standard for speed, while double (FP64) is used for scientific stability.
❌ Incorrect
While pointers are used to access data, float and double represent the numerical 'payload' being processed.
QUESTION 4
When pivoting a loop into a kernel, what replaces the loop counter `i`?
The return value of the function.
A global thread identity calculated from grid/block dimensions.
The hipMalloc address.
The host-side iteration variable.
✅ Correct!
Each thread calculates its own index `i` based on its position in the grid.
❌ Incorrect
The host-side loop is removed entirely; the index is now derived from hardware coordinates.
QUESTION 5
Fill in the blank: To ensure production reliability even in basic kernels, you must ______.
Only use float types.
Add explicit error-checking macros everywhere.
Use a single thread per block.
Avoid all boundary checks.
✅ Correct!
Exactly! Defensive programming via error macros is mandatory in HIP development.
❌ Incorrect
Production code requires robust error handling to catch asynchronous failures.
Case Study: Vector Addition Decomposition
Mapping Sequential Logic to a 1D Grid
You are converting a CPU-based signal processing loop `for(int i=0; i<1000000; i++) { signal[i] *= 2.0; }` into a HIP kernel. The target device has Compute Units that prefer block sizes in powers of 2.
Q
Apply the Independence Heuristic: Why is this loop a candidate for the Parallel Pivot?
Solution:
The operation on `signal[i]` does not depend on `signal[i-1]` or any other element. Since each element can be processed independently, we can map the 1 million iterations to 1 million threads.
Q
If you use a block size of 256, what is the 'Occupation' logic needed within the kernel to handle the million elements?
Solution:
The kernel should first calculate the global ID: `int i = blockIdx.x * blockDim.x + threadIdx.x;`. Because 1,000,000 is not a perfect multiple of 256, a boundary check `if (i < 1000000)` is required to prevent out-of-bounds access by the 'overflow' threads in the final block.